Self-organizing Maps in Web Mining and Semantic Web

Authors

  • Emil Şt. Chifu
  • Ioan Alfred Leţia

Abstract

Nature-inspired approaches represent a new trend in computer science in general and in the Semantic Web in particular, owing to their scalability and robustness. Neural networks are one category of nature-inspired solutions. The self-organizing map (SOM) is a very popular unsupervised neural network model (Kohonen, et al., 2000). It is a data mining and visualization method for complex, high-dimensional data sets. In the first part of the chapter, we present how the SOM model can be applied in Web mining, by using sets of documents as the input data space for the SOM. The result of applying the SOM to a set of documents is a document map, organized in a meaningful manner so that documents with similar content appear at nearby locations on the two-dimensional map display. From the information retrieval point of view, our SOM-based system creates document maps that are readily organized for browsing. A document map also clusters the data, yielding an approximate model of the data distribution in the high-dimensional document space. Some experimental results are included, in which our system discovered a couple of meaningful clusters in a subset of the “20 newsgroups” data set (Lang, K., 1995). The clustering capability of our system allows users to find out quickly what is new on a Web site of interest by comparing the clusters obtained from the site at different moments in time. In the rest of the chapter, we focus on how a more complex SOM-based unsupervised neural network model is used for enriching a domain ontology. Building complete and reliable domain ontologies is the basis for the success of the Semantic Web. The ontology enrichment process consists of adding new concepts, which are attached as hyponyms to the existing nodes of the ontology (Pekar and Staab, 2002). The names of the new concepts are terms represented linguistically by common noun phrases.
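
The document-map idea from the first part of the chapter can be illustrated with a minimal SOM sketch. This is not the authors' implementation: the whitespace tokenizer, raw term-frequency vectors, the 4×4 grid, the linear decay of the learning rate and radius, the Gaussian neighborhood, and the function names `vectorize`, `train_som` and `document_map` are all illustrative assumptions.

```python
import math
import random
from collections import Counter

# Hypothetical sketch: raw term-frequency vectors and a small rectangular SOM.
# Tokenizer, grid size and decay schedules are illustrative assumptions.


def vectorize(docs):
    """Map each text to an L2-normalized term-frequency vector over a shared vocabulary."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({w for toks in tokenized for w in toks})
    index = {w: i for i, w in enumerate(vocab)}
    vectors = []
    for toks in tokenized:
        vec = [0.0] * len(vocab)
        for word, count in Counter(toks).items():
            vec[index[word]] = float(count)
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0
        vectors.append([v / norm for v in vec])
    return vocab, vectors


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def best_matching_unit(weights, x):
    """Grid coordinate (row, col) of the node whose weight vector is closest to x."""
    return min(weights, key=lambda rc: sq_dist(weights[rc], x))


def train_som(vectors, rows=4, cols=4, iterations=2000, lr0=0.5, seed=0):
    """Online SOM training with a Gaussian neighborhood that shrinks over time."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    sigma0 = max(rows, cols) / 2.0
    for t in range(iterations):
        frac = t / iterations
        lr = lr0 * (1.0 - frac)                  # learning rate decays linearly
        sigma = max(sigma0 * (1.0 - frac), 0.5)  # neighborhood radius, floored at 0.5
        x = rng.choice(vectors)
        br, bc = best_matching_unit(weights, x)
        for (r, c), w in weights.items():
            grid_d2 = (r - br) ** 2 + (c - bc) ** 2
            h = math.exp(-grid_d2 / (2.0 * sigma * sigma))
            for i in range(dim):
                w[i] += lr * h * (x[i] - w[i])   # pull the node towards the input
    return weights


def document_map(docs, **som_kwargs):
    """Train a SOM on the documents and return each document's map cell."""
    _, vectors = vectorize(docs)
    weights = train_som(vectors, **som_kwargs)
    return [best_matching_unit(weights, v) for v in vectors]


docs = [
    "hockey team wins the playoff game",
    "the goalie saved the hockey game",
    "nasa launches a space probe into orbit",
    "the space shuttle reached orbit",
    "hockey team wins the playoff game",
]
cells = document_map(docs)
for cell, doc in zip(cells, docs):
    print(cell, doc)
```

Identical documents always land in the same cell. Whether distinct but topically related documents end up in adjacent cells depends on the data, the grid size and the training schedule, so the printed layout is an illustration, not an experimental result.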
The enrichment process can also add new instances to existing concepts of the ontology. In this case, the process is also known in the literature as ontology population or named entity classification, where the named entities are represented linguistically by proper names of people, organizations, locations, etc. (Cimiano and Völker, 2005). In both cases, the process is algorithmically the same; the only difference is the grammatical category of the linguistic entities to be classified: common noun phrases representing terms for new concepts to be added, or proper noun phrases representing named entities, i.e. new instances for the existing concepts.
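
Since both cases reduce to placing a new linguistic entity under an existing node of the taxonomy, a deliberately simple stand-in can show that shared interface. The sketch below is not the chapter's SOM-based enrichment model: it greedily descends the taxonomy by cosine similarity of context-word vectors. The toy taxonomy, the context counts and the names `classify` and `cosine` are hypothetical, invented for illustration.

```python
import math
from collections import Counter

# Hypothetical stand-in, not the chapter's SOM-based model: greedy top-down
# descent of a taxonomy by cosine similarity of context-word vectors.


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(v * b[k] for k, v in a.items() if k in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def classify(term_context, taxonomy, contexts, root):
    """Return the concept under which a new term or named entity is attached.

    Starting at the root, move to the most similar child as long as that
    child is more similar than the current node; stop at a leaf otherwise.
    """
    node, best = root, cosine(term_context, contexts[root])
    while taxonomy.get(node):
        child = max(taxonomy[node], key=lambda c: cosine(term_context, contexts[c]))
        score = cosine(term_context, contexts[child])
        if score <= best:
            break
        node, best = child, score
    return node


# Toy taxonomy and context-word counts (invented for illustration).
taxonomy = {
    "entity": ["organization", "location"],
    "organization": ["company", "university"],
    "location": ["city", "country"],
}
contexts = {
    "entity": Counter(located=1, founded=1, employs=1, capital=1, students=1),
    "organization": Counter(founded=2, employs=2, students=1, shares=1),
    "location": Counter(located=2, capital=2, population=2),
    "company": Counter(employs=3, shares=3, founded=1),
    "university": Counter(students=3, professors=3, founded=1),
    "city": Counter(population=3, mayor=3, located=1),
    "country": Counter(capital=3, population=2, president=3),
}

# The same routine handles a term (common noun phrase) and a named entity.
noun_phrase = Counter(employs=2, shares=2, founded=1)     # e.g. a new concept name
named_entity = Counter(mayor=2, population=2, located=1)  # e.g. a city name
print(classify(noun_phrase, taxonomy, contexts, "entity"))   # company
print(classify(named_entity, taxonomy, contexts, "entity"))  # city
```

A context that fits an inner concept better than any of its children stops at that inner concept, so new hyponyms or instances can be attached at any level of the hierarchy, not only at the leaves.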

Similar resources

Concept Mining with Self-Organizing Maps for the Semantic Web

In this paper, we discuss problems related to the basic Semantic Web methodologies that are based on predicate logic and related formalisms. We discuss complementary and alternative approaches. In particular, we suggest how the Self-Organizing Map can be a basis for making the Semantic Web more semantic.

Use of Semantic Similarity and Web Usage Mining to Alleviate the Drawbacks of User-Based Collaborative Filtering Recommender Systems

One of the most famous methods for recommendation is user-based Collaborative Filtering (CF). This system compares the active user's item ratings with the historical rating records of other users to find similar users, and recommends items that seem interesting to these similar users and have not been rated by the active user. As a way of computing recommendations, the ultimate goal of the user-ba...

Self-organizing maps for latent semantic analysis of free-form text in support of public policy analysis

The huge amount of free-form unstructured text in the blogosphere, its increasing rate of production, and its shrinking window of relevance, present serious challenges to the public policy analyst who seeks to take public opinion into account. Most of the tools which address this problem use XML tagging and other Web 3.0 approaches, which do not address the actual content of blog posts and the ...

Interval set clustering of web users using modified Kohonen self-organizing maps based on the properties of rough sets

Web usage mining involves the application of data mining techniques to discover usage patterns from web data. Clustering is one of the important functions in web usage mining. The likelihood of bad or incomplete data is higher in web usage mining than in conventional applications. The clusters and associations in web usage mining do not necessarily have crisp boundaries. Researchers have studied the pos...

Text-Based Ontology Enrichment Using Hierarchical Self-organizing Maps

The success of the Semantic Web research is dependent upon the construction of complete and reliable domain ontologies. In this paper we describe an unsupervised framework for domain ontology enrichment based on mining domain text corpora. Specifically, we enrich the hierarchical backbone of an existing ontology, i.e. its taxonomy, with new domain-specific concepts. The framework is based on an...

AASA: a Method of Automatically Acquiring Semantic Annotations

An important precondition for the success of the Semantic Web is that the content of web pages be semantically annotated. This paper proposes a method of automatically acquiring semantic annotations (AASA). In the AASA method, we employ a combination of data mining and optimization to acquire semantic annotations. Key features of AASA include combining association ...

Publication date: 2012